The purpose of the research is to assess the results of an A/B test of the improved recommender system.
Objectives:
Technical specifications: test name — recommender_system_test; funnel events — product_page, product_cart, purchase.
The research process:
# Import necessary libraries
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly import graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats as st
import math as mth
Load the datasets and save them in the corresponding variables.
# 2020 marketing event calendar
try:
    marketing_events = pd.read_csv('/datasets/ab_project_marketing_events.csv', sep=',')
except FileNotFoundError:
    marketing_events = pd.read_csv('ab_project_marketing_events.csv', sep=',')
# users who registered from December 7th to December 21st, 2020
try:
    new_users = pd.read_csv('/datasets/final_ab_new_users.csv', sep=',')
except FileNotFoundError:
    new_users = pd.read_csv('final_ab_new_users.csv', sep=',')
# new user actions from December 7th, 2020 to January 4th, 2021
try:
    events = pd.read_csv('/datasets/final_ab_events.csv', sep=',')
except FileNotFoundError:
    events = pd.read_csv('final_ab_events.csv', sep=',')
# test participants table
try:
    participants = pd.read_csv('/datasets/final_ab_participants.csv', sep=',')
except FileNotFoundError:
    participants = pd.read_csv('final_ab_participants.csv', sep=',')
Let's look at the first rows of each dataset.
names = ['marketing_events', 'new_users', 'events', 'participants']
df = [marketing_events, new_users, events, participants]
for i in range(len(df)):
    print('Dataset name:', names[i])
    display(df[i].head())
    print()
Dataset name: marketing_events
| | name | regions | start_dt | finish_dt |
|---|---|---|---|---|
| 0 | Christmas&New Year Promo | EU, N.America | 2020-12-25 | 2021-01-03 |
| 1 | St. Valentine's Day Giveaway | EU, CIS, APAC, N.America | 2020-02-14 | 2020-02-16 |
| 2 | St. Patric's Day Promo | EU, N.America | 2020-03-17 | 2020-03-19 |
| 3 | Easter Promo | EU, CIS, APAC, N.America | 2020-04-12 | 2020-04-19 |
| 4 | 4th of July Promo | N.America | 2020-07-04 | 2020-07-11 |
Dataset name: new_users
| | user_id | first_date | region | device |
|---|---|---|---|---|
| 0 | D72A72121175D8BE | 2020-12-07 | EU | PC |
| 1 | F1C668619DFE6E65 | 2020-12-07 | N.America | Android |
| 2 | 2E1BF1D4C37EA01F | 2020-12-07 | EU | PC |
| 3 | 50734A22C0C63768 | 2020-12-07 | EU | iPhone |
| 4 | E1BDDCE0DAFA2679 | 2020-12-07 | N.America | iPhone |
Dataset name: events
| | user_id | event_dt | event_name | details |
|---|---|---|---|---|
| 0 | E1BDDCE0DAFA2679 | 2020-12-07 20:22:03 | purchase | 99.99 |
| 1 | 7B6452F081F49504 | 2020-12-07 09:22:53 | purchase | 9.99 |
| 2 | 9CD9F34546DF254C | 2020-12-07 12:59:29 | purchase | 4.99 |
| 3 | 96F27A054B191457 | 2020-12-07 04:02:40 | purchase | 4.99 |
| 4 | 1FD7660FDF94CA1F | 2020-12-07 10:15:09 | purchase | 4.99 |
Dataset name: participants
| | user_id | group | ab_test |
|---|---|---|---|
| 0 | D1ABA3E2887B6A73 | A | recommender_system_test |
| 1 | A7A3664BD6242119 | A | recommender_system_test |
| 2 | DABC14FDDFADD29E | A | recommender_system_test |
| 3 | 04988C5DF189632E | A | recommender_system_test |
| 4 | 482F14783456D21B | B | recommender_system_test |
The regions column of the 'marketing_events' dataset can store several region names per event.
We will check the data types, as well as duplicates and missing values, using the 'info' method.
names = ['marketing_events', 'new_users', 'events', 'participants']
df = [marketing_events, new_users, events, participants]
for i in range(len(df)):
    print('Dataset name:', names[i])
    print()
    print(df[i].info())
    print('Duplicates:', df[i].duplicated().sum())
    print('Missing values:')
    print(df[i].isna().sum())
    print()
Dataset name: marketing_events

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   name       14 non-null     object
 1   regions    14 non-null     object
 2   start_dt   14 non-null     object
 3   finish_dt  14 non-null     object
dtypes: object(4)
memory usage: 576.0+ bytes
None
Duplicates: 0
Missing values:
name         0
regions      0
start_dt     0
finish_dt    0
dtype: int64

Dataset name: new_users

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61733 entries, 0 to 61732
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   user_id     61733 non-null  object
 1   first_date  61733 non-null  object
 2   region      61733 non-null  object
 3   device      61733 non-null  object
dtypes: object(4)
memory usage: 1.9+ MB
None
Duplicates: 0
Missing values:
user_id       0
first_date    0
region        0
device        0
dtype: int64

Dataset name: events

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440317 entries, 0 to 440316
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   user_id     440317 non-null  object
 1   event_dt    440317 non-null  object
 2   event_name  440317 non-null  object
 3   details     62740 non-null   float64
dtypes: float64(1), object(3)
memory usage: 13.4+ MB
None
Duplicates: 0
Missing values:
user_id            0
event_dt           0
event_name         0
details       377577
dtype: int64

Dataset name: participants

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18268 entries, 0 to 18267
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   user_id  18268 non-null  object
 1   group    18268 non-null  object
 2   ab_test  18268 non-null  object
dtypes: object(3)
memory usage: 428.3+ KB
None
Duplicates: 0
Missing values:
user_id    0
group      0
ab_test    0
dtype: int64
In all datasets, the date columns are stored as 'object' and need to be converted to 'datetime'. There are no duplicates. There are missing values in the details column of the 'events' dataset.
# converting the date columns in all datasets to datetime
marketing_events['start_dt'] = pd.to_datetime(marketing_events['start_dt'], format='%Y-%m-%d')
marketing_events['finish_dt'] = pd.to_datetime(marketing_events['finish_dt'], format='%Y-%m-%d')
new_users['first_date'] = pd.to_datetime(new_users['first_date'], format='%Y-%m-%d')
events['event_dt'] = pd.to_datetime(events['event_dt'], format='%Y-%m-%d %H:%M:%S')
# normalize() truncates the timestamps to midnight, keeping only the date part
events['event_dt'] = events['event_dt'].dt.normalize()
events['new_event_dt'] = events['event_dt'].dt.date
Let's look at the missing values in the details column of the 'events' dataset. From the documentation, we know that the details column contains additional information about the event. For example, the cost of the purchase in dollars is stored in this field for the event 'purchase'.
# grouping the dataset by event name and counting the non-missing 'details' values
events.groupby('event_name').agg(event_count=('details', 'count')).reset_index()
| | event_name | event_count |
|---|---|---|
| 0 | login | 0 |
| 1 | product_cart | 0 |
| 2 | product_page | 0 |
| 3 | purchase | 62740 |
The missing values occur in the events 'login', 'product_cart', and 'product_page'. This means that details are not collected for these events. We will leave the missing values as they are.
We have four datasets:
There are no duplicates, and there are missing values in the details column of the 'events' dataset, which we will leave unprocessed: details are not collected for all events.
The format of the columns with date information has been changed from 'object' to 'datetime'.
# checking the start and stop dates for user recruitment
print('Start date for new user recruitment:', new_users.first_date.min())
print('Stop date for new user recruitment:', new_users.first_date.max())
Start date for new user recruitment: 2020-12-07 00:00:00
Stop date for new user recruitment: 2020-12-23 00:00:00
New user recruitment was stopped two days later than the deadline set in the Technical Specification.
# checking the start and stop dates of the test
print('Test start date:', events.event_dt.min())
print('Test stop date:', events.event_dt.max())
Test start date: 2020-12-07 00:00:00
Test stop date: 2020-12-30 00:00:00
The test was stopped on December 30th, 5 days earlier than the deadline set in the Technical Specification. This means that not all users had time to "live" the 14-day testing period.
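A quick way to see how the early stop truncates the observation window is to compute the last registration date that still gets a full 14 days of events. A minimal sketch, using only the dates from the outputs above:

```python
import pandas as pd

# dates taken from the outputs above
recruit_stop_actual = pd.Timestamp('2020-12-23')  # actual last registration date
test_stop = pd.Timestamp('2020-12-30')            # actual last event date

# a user registered on day d is observed for (test_stop - d) days,
# so the last registration date with a full 14-day window is:
last_full_window = test_stop - pd.Timedelta(days=14)
print(last_full_window.date())  # 2020-12-16

# registrations after this date are observed for fewer than 14 days
shortfall = (recruit_stop_actual - last_full_window).days
print(shortfall)  # 7 days' worth of registrations have a truncated window
```

So everyone registered after December 16 had less than the planned 14-day window.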
Let's check whether test participants from the European Union make up 15 percent of all new users registered in that region during the recruitment period from December 7th to December 21st.
To determine the region of test participants, we will add information from new_users to the test_participants dataset. We will only keep users from the European Union.
# combining the 'participants' and 'new_users' datasets into the 'test_participants' variable
test_participants = pd.merge(participants, new_users).query('region == "EU"')
# keeping only users who registered during the recruitment window for the test
test_participants = test_participants.query('"2020-12-07" <= first_date < "2020-12-21"')
# counting all EU users registered on the dates specified in the Technical Specifications who are in the test
test_participants_eu = test_participants.query('ab_test == "recommender_system_test"')['user_id'].nunique()
# counting the number of new European users registered on the dates specified in the Technical Specifications
new_users_eu = new_users.query('region == "EU" and "2020-12-07" <= first_date < "2020-12-21"')['user_id'].nunique()
print("The proportion of users from the EU region among test participants:", round(test_participants_eu * 100 / new_users_eu, 1))
The proportion of users from the EU region among test participants: 15.0
The test participants from the European Union made up 15%, which corresponds to the technical requirement.
print('Actual number of test participants:', test_participants_eu)
print(f'The proportion of actual participants to the plan: {round(test_participants_eu * 100 / 6000)}%')
Actual number of test participants: 5668
The proportion of actual participants to the plan: 94%
Let's check whether the test period overlaps with marketing and other activities.
marketing_events.query('start_dt >= "2020-12-07"')
| | name | regions | start_dt | finish_dt |
|---|---|---|---|---|
| 0 | Christmas&New Year Promo | EU, N.America | 2020-12-25 | 2021-01-03 |
| 10 | CIS New Year Gift Lottery | CIS | 2020-12-30 | 2021-01-07 |
During the last 5 days of the test, the Christmas&New Year Promo ran in the European Union and North America. We will take this into account when assessing the test results.
Check if there are any intersections with a competing test, as well as users participating in two test groups at the same time. Check the uniformity of distribution to the test groups and the correctness of their formation.
# checking which tests users participated in
participants['ab_test'].unique()
array(['recommender_system_test', 'interface_eu_test'], dtype=object)
Users also participated in another test, interface_eu_test. Let's check whether any users overlap between the two tests, and between the recommender_system_test groups.
# grouping test participants by user_id and selecting those who participated in two tests, counting the number
intersection = (participants.groupby('user_id') \
.agg(tests_count=('ab_test', 'count')) \
.reset_index() \
.sort_values(by='tests_count', ascending=False) \
.query('tests_count == 2'))
intersection.shape[0]
1602
Found intersections between tests: 1,602 users participated in two tests.
# saving the IDs of users who participated in both tests in the 'intersection_users' variable
intersection_users = list(intersection['user_id'])
# saving the users who participated only in the 'recommender_system_test' test in the 'test_participants' dataset
test_participants = test_participants[~test_participants.user_id.isin(intersection_users)].query('ab_test == "recommender_system_test"')
# checking if there are users who belong to both test groups
test_participants.groupby('user_id') \
.agg(groups_count=('group', 'count')) \
.reset_index() \
.sort_values(by='groups_count', ascending=False) \
.head()
| | user_id | groups_count |
|---|---|---|
| 0 | 000ABE35EE11412F | 1 |
| 2836 | A990335135D63779 | 1 |
| 2822 | A8514C09C055B131 | 1 |
| 2823 | A851E8F00A4F1253 | 1 |
| 2824 | A880134551CFF8B7 | 1 |
There are no intersections among the test groups.
# updating the number of participants
print("The number of test participants: ", test_participants.shape[0])
The number of test participants: 4245
The test was stopped on December 30th prematurely and not all users had the opportunity to perform events within 14 days. Let's see how it will affect the test results.
test_participants.groupby('group').agg(users=('user_id', 'nunique')).reset_index()
| | group | users |
|---|---|---|
| 0 | A | 2425 |
| 1 | B | 1820 |
There are more users in the control group.
print('The actual number of test participants:', test_participants.shape[0])
print(f'The actual ratio of participants to the plan: {round(test_participants.shape[0] * 100 / 6000)}%')
The actual number of test participants: 4245
The actual ratio of participants to the plan: 71%
Let's check the EU users' ratio again.
print("The proportion of users from the EU region among test participants:", round(test_participants.shape[0] * 100 / new_users_eu, 1))
The proportion of users from the EU region among test participants: 11.3
The EU users' ratio among test participants has decreased to 11.3%. 15% was claimed in the Technical Specification.
Let's prepare data for the research analysis.
# let's store in a variable 'data' the information about all the events of the filtered users
data = pd.merge(test_participants, events, how='left')
# checking data types and missing values
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 17323 entries, 0 to 17322
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   user_id       17323 non-null  object
 1   group         17323 non-null  object
 2   ab_test       17323 non-null  object
 3   first_date    17323 non-null  datetime64[ns]
 4   region        17323 non-null  object
 5   device        17323 non-null  object
 6   event_dt      15322 non-null  datetime64[ns]
 7   event_name    15322 non-null  object
 8   details       2024 non-null   float64
 9   new_event_dt  15322 non-null  object
dtypes: datetime64[ns](2), float64(1), object(7)
memory usage: 1.5+ MB
The missing values in the date and event name columns indicate that there are users who have not taken any actions. We will replace the missing values in the event_name column with "visit" and in the details column with zero values.
data['event_name'] = data['event_name'].fillna(value='visit')
data['details'] = data['details'].fillna(0)
We will also verify that the dates of the events match the dates of the test.
print(data.event_dt.min())
print(data.event_dt.max())
2020-12-07 00:00:00
2020-12-30 00:00:00
For each user, we also need to take into account only those events that occurred within the first 14 days after registration.
# adding to the dataframe a 'days' column with the number of days from registration to the event
data['days'] = (data['event_dt'] - data['first_date']).dt.days
# filling in missing values with zeros
data['days'] = data['days'].fillna(0)
# keeping only events that occurred within 14 days of the user's registration
data = data.query('days <= 14')
# the missing values in the event date column (for users who did not perform events)
# will be filled with the website visit date
data['event_dt'] = np.where(data['event_dt'].isna(), data['first_date'], data['event_dt'])
The test deviated from the technical specification:
- New user recruitment was stopped two days later than the deadline set in the specification, on December 23 instead of December 21.
- The test was stopped on December 30, five days earlier than the deadline set in the specification, so not all users completed events over the 14-day testing period.
- The proportion of test participants from the European Union was 11.3%, less than the 15% set in the specification.
- After removing users who participated in both tests, 4,245 users remained, 71% of the planned 6,000.
- Group B had 1,820 users; control group A had 2,425.
- During the last 5 days of the test, the "Christmas & New Year Promo" marketing event ran in the European Union and North America, which may have affected the test results.
The data was prepared for exploratory data analysis.
plt.style.use('seaborn')
groups = ["A", "B"]
for i in range(len(groups)):
    a = groups[i]
    df = data.query('group == @a').groupby('user_id').agg(events_number=('event_name', 'count'))
    print(f'Average event count per user in group {a}:', round(df['events_number'].mean(), 1))
    df = df.query('events_number > 1')
    plt.figure(figsize=(10, 5))
    sns.histplot(data=df, bins=27)
    plt.title('Histogram of event count distribution per user in sample ' + a, fontsize=15)
    plt.ylabel('Number of users', fontsize=12)
    plt.xlabel('Event count per user', fontsize=12);
Average event count per user in group A: 5.1
Average event count per user in group B: 2.5
In both samples, the largest share of users performed no events. In sample B, more than 800 users had no events, even though this sample is smaller than A.
The distribution over users who performed at least one event is roughly normal and shifted to the left.
The average event count per user is 5.1 in group A and 2.5 in group B.
groups = ["A", "B"]
for i in range(len(groups)):
    a = groups[i]
    df = data.query('group == @a')['event_dt']
    plt.figure(figsize=(10, 5))
    sns.histplot(data=df)
    plt.title('Histogram of event count distribution by day in sample ' + a, fontsize=15)
    plt.ylabel('Event count', fontsize=12)
    plt.xlabel('Date', fontsize=12)
    plt.xticks(rotation=90);
In the control group, the peak number of events falls at the beginning of the second week, after which the number of events gradually declines. In group B there are two peaks, at the beginning of the first week and the beginning of the second week, followed by a similar decline.
Let's look at the events that make up the funnel.
data.event_name.unique()
array(['purchase', 'product_cart', 'product_page', 'login', 'visit'],
dtype=object)
data.query('group == "A"').groupby('event_name').agg({'user_id':'nunique'})
| event_name | user_id |
|---|---|
| login | 1651 |
| product_cart | 514 |
| product_page | 1072 |
| purchase | 517 |
| visit | 774 |
The dataset contains an event 'visit', meaning the user entered the website but took no action. Therefore, the first funnel step will include users with either the 'visit' or the 'login' event, i.e. all visitors.
# grouping the data for the group A funnel
funnel_A = data.query('group == "A"').groupby('event_name').agg(clients=('user_id', 'nunique')).reset_index()
# the 'visit' row becomes the total number of visitors: users with 'visit' plus users with 'login'
funnel_A.loc[funnel_A['event_name'] == 'visit', 'clients'] = (
    funnel_A.loc[funnel_A['event_name'] == 'visit', 'clients'].iloc[0]
    + funnel_A.loc[funnel_A['event_name'] == 'login', 'clients'].iloc[0]
)
# grouping the data for the group B funnel
funnel_B = data.query('group == "B"').groupby('event_name').agg(clients=('user_id', 'nunique')).reset_index()
# the 'visit' row becomes the total number of visitors: users with 'visit' plus users with 'login'
funnel_B.loc[funnel_B['event_name'] == 'visit', 'clients'] = (
    funnel_B.loc[funnel_B['event_name'] == 'visit', 'clients'].iloc[0]
    + funnel_B.loc[funnel_B['event_name'] == 'login', 'clients'].iloc[0]
)
# forming the list of client counts for funnel A
clients_A = []
for i in ['visit', 'login', 'product_page', 'product_cart', 'purchase']:
    clients_A.append(funnel_A.loc[funnel_A['event_name'] == i, 'clients'].iloc[0])
# forming the list of client counts for funnel B
clients_B = []
for i in ['visit', 'login', 'product_page', 'product_cart', 'purchase']:
    clients_B.append(funnel_B.loc[funnel_B['event_name'] == i, 'clients'].iloc[0])
titles = ['Visits',
'Registrations',
'Page views',
'Cart visits',
'Payments',
]
fig = go.Figure()
fig.add_trace(go.Funnel(y = titles, x = clients_A, name="A", textinfo = "value+percent initial"))
fig.add_trace(go.Funnel(y = titles, x = clients_B, name="B", textinfo = "value+percent initial"))
fig.update_layout(coloraxis=dict(colorscale='Bluered_r'), showlegend=True,
title="Conversion funnels of test groups", legend_title="Group")
fig.show()
At the registration step, group A shows a higher conversion from all visits: 68.1% versus 32.5% in group B.
In group A, 64.9% of registered users viewed a product page; in group B, 55.7%.
Of those who viewed a product page, 47.9% went on to the cart in group A, and 51.5% in group B.
Next, we see an anomaly in group A: more users paid than visited the cart. Some users may be able to pay bypassing the cart (for example, in the app).
According to the technical specification, we expect the number of cart views and purchases in group B to increase by at least 10%. Since the groups differ in size, we will compare relative numbers.
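The anomaly can be verified directly with a set difference: users who have a purchase event but no product_cart event. A minimal sketch on a toy frame that mimics the structure of `data` (the user IDs are hypothetical):

```python
import pandas as pd

# toy events table with the same columns of interest as `data`
toy = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3'],
    'event_name': ['product_cart', 'purchase', 'purchase', 'product_cart', 'purchase'],
})

buyers = set(toy.loc[toy['event_name'] == 'purchase', 'user_id'])
cart_users = set(toy.loc[toy['event_name'] == 'product_cart', 'user_id'])

# users who paid without a recorded cart visit
no_cart_buyers = buyers - cart_users
print(sorted(no_cart_buyers))  # ['u2']
```

Applying the same set difference to `data` for group A would list the users behind the anomaly.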
The distribution over users who performed at least one event is roughly normal and shifted to the left.
In the control group, the peak number of events falls at the beginning of the second week, after which the number of events gradually declines. In group B there are two peaks, at the beginning of the first week and the beginning of the second week, followed by a similar decline.
At the registration step, group A shows a higher conversion from all visits: 68.1% versus 32.5% in group B. In group A, 64.9% of registered users viewed a product page; in group B, 55.7%.
Of the users who viewed a product page, 47.9% went on to the cart in group A, and 51.5% in group B.
There is an anomaly in group A: more users paid than visited the cart. Some users may be able to pay bypassing the cart (for example, in the app).
According to the technical specification, we expect the number of cart views and purchases in group B to increase by at least 10%. Since the groups differ in size, we will compare relative numbers.
To evaluate the results, we will create charts that display the dynamics of average check and conversion by groups.
For the charts, we will collect cumulative data. We will create dataframes 'cumulativeRevenueA' and 'cumulativeRevenueB' with the columns event_dt, orders (cumulative number of purchases), and revenue (cumulative revenue).
# selecting only purchases, grouping by event date and group, counting orders and revenue
cumulative_data = data.query('event_name == "purchase"').groupby(['event_dt', 'group'], as_index=False).agg(orders=('event_name', 'count'), revenue=('details', 'sum'))
# .copy() avoids assigning into a view of cumulative_data (SettingWithCopyWarning)
cumulativeRevenueA = cumulative_data[cumulative_data['group'] == 'A'][['event_dt', 'orders', 'revenue']].copy()
cumulativeRevenueB = cumulative_data[cumulative_data['group'] == 'B'][['event_dt', 'orders', 'revenue']].copy()
# calculating the cumulative sum in the 'orders' and 'revenue' columns
cumulativeRevenueA[['orders', 'revenue']] = cumulativeRevenueA[['orders', 'revenue']].cumsum(axis=0)
cumulativeRevenueB[['orders', 'revenue']] = cumulativeRevenueB[['orders', 'revenue']].cumsum(axis=0)
We build a chart of the dynamics of cumulative average check by groups.
f = plt.figure()
f.set_figwidth(12)
f.set_figheight(6)
plt.title('Chart of cumulative average check by groups',fontsize=17)
plt.xlabel('Days',fontsize=12)
plt.ylabel('Cumulative average check, $',fontsize=12);
plt.plot(cumulativeRevenueA['event_dt'], cumulativeRevenueA['revenue']/cumulativeRevenueA['orders'], label='A')
plt.plot(cumulativeRevenueB['event_dt'], cumulativeRevenueB['revenue']/cumulativeRevenueB['orders'], label='B')
plt.legend();
From the chart, we can see that after two weeks of testing, the average check charts stabilized. The average check of group B is 4 dollars lower than the average check of group A.
Let's build a chart of the relative change in cumulative average check.
# combining the data of both groups
mergedCumulativeRevenue = cumulativeRevenueA.merge(cumulativeRevenueB, left_on='event_dt', right_on='event_dt', how='left', suffixes=['A', 'B'])
f = plt.figure()
f.set_figwidth(12)
f.set_figheight(6)
plt.title('Chart of relative change in cumulative average check of group B to group A',fontsize=17)
plt.xlabel('Days',fontsize=12)
plt.ylabel('The ratio of B to A cum. av. check',fontsize=12);
plt.plot(mergedCumulativeRevenue['event_dt'], (mergedCumulativeRevenue['revenueB']/mergedCumulativeRevenue['ordersB'])/(mergedCumulativeRevenue['revenueA']/mergedCumulativeRevenue['ordersA'])-1)
plt.axhline(y=0, color='black', linestyle='--');
After the chart stabilized, the average check of group B remained more than 15% lower than in group A.
We will create dataframes 'cumulativeConversionA' and 'cumulativeConversionB' with the columns event_dt, orders, and visitors.
# grouping the data by event date and group, counting the number of events per day (used as the 'visitors' denominator)
cumulative_data_2 = data.groupby(['event_dt', 'group'], as_index=False).agg(visitors=('event_name', 'count'))
# merging with the table with the number of orders and revenue
cumulativeConversion = pd.merge(cumulative_data, cumulative_data_2)
# .copy() avoids assigning into a view of cumulativeConversion (SettingWithCopyWarning)
cumulativeConversionA = cumulativeConversion[cumulativeConversion['group'] == 'A'][['event_dt', 'orders', 'visitors']].copy()
cumulativeConversionB = cumulativeConversion[cumulativeConversion['group'] == 'B'][['event_dt', 'orders', 'visitors']].copy()
# calculating the cumulative sum in the 'orders' and 'visitors' columns
cumulativeConversionA[['orders', 'visitors']] = cumulativeConversionA[['orders', 'visitors']].cumsum(axis=0)
cumulativeConversionB[['orders', 'visitors']] = cumulativeConversionB[['orders', 'visitors']].cumsum(axis=0)
Let's build a chart of the dynamics of cumulative conversion by groups.
f = plt.figure()
f.set_figwidth(12)
f.set_figheight(6)
plt.title('Cumulative conversion chart by groups',fontsize=17)
plt.xlabel('Days',fontsize=12)
plt.ylabel('Value of cumulative conversion',fontsize=12);
plt.plot(cumulativeConversionA['event_dt'], cumulativeConversionA['orders']/cumulativeConversionA['visitors'], label='A')
plt.plot(cumulativeConversionB['event_dt'], cumulativeConversionB['orders']/cumulativeConversionB['visitors'], label='B')
plt.legend();
The conversion charts stabilized after 10 days of testing. Conversion in group B is 9.2%, in group A is 12.6%.
Let's build a chart of the relative change in cumulative conversion.
# adding a 'conversion' column to the dataframes
cumulativeConversionA['conversion'] = cumulativeConversionA['orders']/cumulativeConversionA['visitors']
cumulativeConversionB['conversion'] = cumulativeConversionB['orders']/cumulativeConversionB['visitors']
# merging the dataframes
mergedCumulativeConversions = (
cumulativeConversionA[['event_dt','conversion']] \
.merge(cumulativeConversionB[['event_dt','conversion']], left_on='event_dt', right_on='event_dt', how='left', suffixes=['A', 'B']))
# building the chart
f = plt.figure()
f.set_figwidth(12)
f.set_figheight(6)
plt.title('Chart of the relative change in cumulative conversion of group B to group A',fontsize=17)
plt.xlabel('Days',fontsize=12)
plt.ylabel('The ratio of cumulative conversion of group B to group A',fontsize=12);
plt.plot(mergedCumulativeConversions['event_dt'], mergedCumulativeConversions['conversionB']/mergedCumulativeConversions['conversionA']-1, label="the relative change in cumulative conversion of group B to group A")
plt.legend()
plt.axhline(y=0, color='black', linestyle='--')
plt.axhline(y=0.1, color='grey', linestyle='--');
After the chart stabilized, the relative conversion of group B remained more than 20% lower than in group A. According to the technical specification, an improvement of at least 10% was expected.
According to the technical specifications, we expected an improvement in the conversion rate to view product cards of no less than 10%. We will compare the proportions of customers at all funnel steps.
Hypotheses:
H0: there is no difference between the proportions in groups A and B.
H1: there is a difference between the proportions in groups A and B.
# function to check for a statistically significant difference between two proportions
def z_test(successes, trials):
    alpha = .05  # critical level of statistical significance
    # proportion of successes in the first group:
    p1 = successes[0] / trials[0]
    # proportion of successes in the second group:
    p2 = successes[1] / trials[1]
    # proportion of successes in the combined dataset:
    p_combined = (successes[0] + successes[1]) / (trials[0] + trials[1])
    # difference between the proportions
    difference = p1 - p2
    # z-statistic in standard deviations of the standard normal distribution
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1/trials[0] + 1/trials[1]))
    # standard normal distribution (mean 0, standard deviation 1)
    distr = st.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    print('p-value: ', p_value)
    if p_value < alpha:
        print('Reject the null hypothesis: there is a significant difference between the proportions')
    else:
        print('Could not reject the null hypothesis, there is no evidence to consider the proportions different')
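The statistic computed by `z_test` is the standard two-proportion z-test with a pooled success rate:

```latex
z = \frac{p_1 - p_2}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
```

where x_i is the number of successes and n_i the number of trials in group i. Under the null hypothesis, z is approximately standard normal, so the two-sided p-value is 2(1 − Φ(|z|)), exactly as computed in the function.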
# loop to check the proportions at each conversion step with the z-test
steps_A = clients_A
steps_B = clients_B
funnel_events = ['Conversion to registration A and B:',
                 'Conversion to product card views A and B:',
                 'Conversion to cart views A and B:',
                 'Conversion to purchases A and B:']
i = 0
while i <= len(steps_A) - 2:
    trials = np.array([steps_A[0], steps_B[0]])
    successes = np.array([steps_A[i+1], steps_B[i+1]])
    print(funnel_events[i])
    print(trials)
    print(successes)
    z_test(successes, trials)
    i += 1
    print()
Conversion to registration A and B:
[2425 1819]
[1651 592]
p-value:  0.0
Reject the null hypothesis: there is a significant difference between the proportions

Conversion to product card views A and B:
[2425 1819]
[1072 330]
p-value:  0.0
Reject the null hypothesis: there is a significant difference between the proportions

Conversion to cart views A and B:
[2425 1819]
[514 170]
p-value:  0.0
Reject the null hypothesis: there is a significant difference between the proportions

Conversion to purchases A and B:
[2425 1819]
[517 167]
p-value:  0.0
Reject the null hypothesis: there is a significant difference between the proportions
The null hypothesis of no difference between the proportions was rejected at every stage: the differences between the proportions are significant.
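Note that the loop above always uses the first funnel step as the number of trials, i.e. each step is compared against all visitors. The step-to-step conversions (each step relative to the previous one) can be checked the same way; a self-contained sketch for the page-to-cart step, restating the pooled z-test compactly and using the counts from the output above:

```python
import numpy as np
from scipy import stats as st

def z_test_p(successes, trials):
    # pooled two-proportion z-test, two-sided p-value
    p1, p2 = successes[0] / trials[0], successes[1] / trials[1]
    p_comb = (successes[0] + successes[1]) / (trials[0] + trials[1])
    z = (p1 - p2) / np.sqrt(p_comb * (1 - p_comb) * (1 / trials[0] + 1 / trials[1]))
    return 2 * (1 - st.norm.cdf(abs(z)))

# counts from the output above: product page viewers and cart visitors, groups A and B
page_views = np.array([1072, 330])
cart_views = np.array([514, 170])

p = z_test_p(cart_views, page_views)
print(round(p, 3))  # ~0.257: no significant difference at this step
```

So while every step differs significantly from group A when measured against all visitors, the page-to-cart transition taken on its own does not.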
After two weeks of testing, the cumulative average check graphs stabilized. The average check in group B remained more than 15% lower than in group A.
Conversion graphs stabilized after 10 days of testing. Conversion in group B was 9.2%, in group A 12.6%. The relative conversion of group B was more than 20% lower than group A. According to the technical specification, an improvement in the metric was expected by no less than 10%.
We tested the statistical difference of customer proportions at each stage of the conversion funnel using a z-test. The hypothesis that there is no difference between the proportions was not confirmed at any stage: the difference between the proportions is significant.
The distribution over users who performed at least one event is roughly normal and shifted to the left.
In the control group, the peak number of events occurs at the beginning of the second week, and then the number of events gradually decreases. In group B, there are two peaks of events — at the beginning of the first week and the beginning of the second week, and then a decrease.
According to the technical specification, we expected the number of cart views and purchases in group B to increase by at least 10%. The test results did not show this:
After two weeks of testing, the cumulative average check graphs stabilized. The average check in group B remained 15% lower than in group A.
The conversion graphs stabilized after 10 days of testing. Conversion in group B is 9.2%, in group A it's 12.6%. The relative conversion of group B was lower than group A by more than 20%. As per the requirements, the improvement of the metric was expected to be no less than 10%.
The overlap of the last 5 days of the test with the Christmas & New Year Promo marketing event did not affect the test results.
We tested the statistical difference in customer proportions at each step of the conversion funnel using the z-test. The null hypothesis of no difference was rejected for the conversions from visits to registration and from registration to product page views: those differences were significant.
For the conversions from product page views to cart views and from cart views to purchases, the null hypothesis was not rejected: there are no grounds to consider the proportions different at these steps.
As a result of the test, we can conclude that the new recommender system decreases conversion. However, the test was carried out with significant deviations from the technical specification listed above.
Therefore, it is recommended to restart the test.